In [1]:
import graphlab
In [2]:
song_data = graphlab.SFrame('D:/coursera/song_data.gl/')
[INFO] graphlab.cython.cy_server: GraphLab Create v1.10.1 started. Logging: C:\Users\dlnu\AppData\Local\Temp\graphlab_server_1466857020.log.0
In [3]:
song_data
Out[3]:
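+----------------------------------------------+--------------------+--------------+-----------------------------------+-------------------------------------------------+-----------------------------------------------+
| user_id                                      | song_id            | listen_count | title                             | artist                                          | song                                          |
+----------------------------------------------+--------------------+--------------+-----------------------------------+-------------------------------------------------+-----------------------------------------------+
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOAKIMP12A8C130995 | 1            | The Cove                          | Jack Johnson                                    | The Cove - Jack Johnson                       |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBBMDR12A8C13253B | 2            | Entre Dos Aguas                   | Paco De Lucia                                   | Entre Dos Aguas - Paco De Lucia ...           |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 | 1            | Stronger                          | Kanye West                                      | Stronger - Kanye West                         |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBYHAJ12A6701BF1D | 1            | Constellations                    | Jack Johnson                                    | Constellations - Jack Johnson ...             |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODACBL12A8C13C273 | 1            | Learn To Fly                      | Foo Fighters                                    | Learn To Fly - Foo Fighters ...               |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODDNQT12A6D4F5F7E | 5            | Apuesta Por El Rock 'N' Roll ...  | Héroes del Silencio                             | Apuesta Por El Rock 'N' Roll - Héroes del ... |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODXRTY12AB0180F3B | 1            | Paper Gangsta                     | Lady GaGa                                       | Paper Gangsta - Lady GaGa                     |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 | 1            | Stacked Actors                    | Foo Fighters                                    | Stacked Actors - Foo Fighters ...             |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFRQTD12A81C233C0 | 1            | Sehr kosmisch                     | Harmonia                                        | Sehr kosmisch - Harmonia                      |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOHQWYZ12A6D4FA701 | 1            | Heaven's gonna burn your eyes ... | Thievery Corporation feat. Emiliana Torrini ... | Heaven's gonna burn your eyes - Thievery ...  |
+----------------------------------------------+--------------------+--------------+-----------------------------------+-------------------------------------------------+-----------------------------------------------+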
[1116609 rows x 6 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

How many songs and how many users are there?

In [9]:
songs = song_data['song_id'].unique()
users = song_data['user_id'].unique()
print 'num_song: ', len(songs)
print 'num_user: ', len(users)
num_song:  10000
num_user:  66346
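`unique()` returns the distinct values of a column, so `len()` of it is the number of distinct songs or users. The same count can be sketched with plain Python sets; the `listens` rows below are hypothetical toy data (the song IDs are borrowed from the table above, the users are made up), not a copy of `song_data`:

```python
# Hypothetical toy listen log: (user_id, song_id, listen_count) rows.
listens = [
    ("u1", "SOAKIMP12A8C130995", 1),
    ("u1", "SOBXHDL12A81C204C0", 1),
    ("u2", "SOAKIMP12A8C130995", 3),
    ("u3", "SODACBL12A8C13C273", 2),
]

def count_distinct(rows):
    """Return (number of distinct songs, number of distinct users)."""
    songs = {song for _, song, _ in rows}
    users = {user for user, _, _ in rows}
    return len(songs), len(users)

num_song, num_user = count_distinct(listens)
print("num_song:", num_song)  # num_song: 3
print("num_user:", num_user)  # num_user: 3
```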

Song popularity ranking

In [6]:
graphlab.canvas.set_target('ipynb')
song_data['song'].show()
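`show()` renders an interactive Canvas summary of the `song` column, which for a string column typically lists its most frequent values. A text-only approximation is to count how many rows mention each song; assuming each row of `song_data` is one user-song pair, that count is the number of distinct listeners. The `songs` list below is hypothetical:

```python
from collections import Counter

# Hypothetical 'song' column: one entry per (user, song) row.
songs = [
    "Sehr kosmisch - Harmonia",
    "Undo - Björk",
    "Sehr kosmisch - Harmonia",
    "Revelry - Kings Of Leon",
    "Sehr kosmisch - Harmonia",
    "Undo - Björk",
]

def top_songs(song_column, k):
    """Return the k songs that appear in the most rows, with their row counts."""
    return Counter(song_column).most_common(k)

for title, n in top_songs(songs, 2):
    print(title, n)
```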

Recommending by song popularity

In [7]:
train_data, test_data = song_data.random_split(0.8, seed=0)
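`random_split(0.8, seed=0)` sends roughly 80% of the rows to `train_data` and the rest to `test_data`. The training log below reports 893,580 observations out of 1,116,609 rows (about 80.03%), so the split is approximate rather than exact. A minimal sketch of the same idea, assuming an independent coin flip per row; it will not reproduce GraphLab's actual split for `seed=0`:

```python
import random

def random_split(rows, fraction, seed=None):
    """Send each row to the first part with probability `fraction`.

    Illustrates the idea behind SFrame.random_split only; the random
    stream differs, so the same seed gives a different split than GraphLab.
    """
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))
train, test = random_split(rows, 0.8, seed=0)
print(len(train), len(test))  # close to 800 / 200, not exact
```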
In [8]:
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                         user_id = 'user_id',
                                                         item_id = 'song')
Recsys training: model = popularity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 1.02s
893580 observations to process; with 9952 unique items.
In [11]:
popularity_model.recommend(users = [users[0]]) # recommend songs for the first user in `users`
Out[11]:
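+----------------------------------------------+-----------------------------------------------------+--------+------+
| user_id                                      | song                                                | score  | rank |
+----------------------------------------------+-----------------------------------------------------+--------+------+
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Sehr kosmisch - Harmonia                            | 4754.0 | 1    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Undo - Björk                                        | 4227.0 | 2    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | You're The One - Dwight Yoakam ...                  | 3781.0 | 3    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Dog Days Are Over (Radio Edit) - Florence + The ... | 3633.0 | 4    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Revelry - Kings Of Leon                             | 3527.0 | 5    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Horn Concerto No. 4 in E flat K495: II. Romance ... | 3161.0 | 6    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Secrets - OneRepublic                               | 3148.0 | 7    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Fireflies - Charttraxx Karaoke ...                  | 2532.0 | 8    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Tive Sim - Cartola                                  | 2521.0 | 9    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Drop The World - Lil Wayne / Eminem ...             | 2053.0 | 10   |
+----------------------------------------------+-----------------------------------------------------+--------+------+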
[10 rows x 4 columns]
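The popularity model ignores who the user is: every user gets the same ranking, minus songs they already have in the training data (the `exclude_known` option of `recommend`, which defaults to `True`). The scores above look like raw observation counts per song in `train_data`; that reading is an assumption. A minimal sketch under that assumption, with hypothetical `(user, item)` pairs and a hypothetical `popularity_recommend` helper:

```python
from collections import Counter

def popularity_recommend(train, user, k=10):
    """Rank items by how many training observations they have.

    `train` is a list of (user, item) pairs. Items the user already has in
    `train` are skipped, mimicking exclude_known=True. Using the raw count as
    the score is an assumption based on the magnitudes shown above.
    """
    counts = Counter(item for _, item in train)
    seen = {item for u, item in train if u == user}
    ranked = [(item, float(n)) for item, n in counts.most_common() if item not in seen]
    return [(item, score, rank) for rank, (item, score) in enumerate(ranked[:k], start=1)]

train = [("a", "s1"), ("b", "s1"), ("c", "s1"),
         ("a", "s2"), ("b", "s2"), ("c", "s3")]
print(popularity_recommend(train, "c", k=2))         # [('s2', 2.0, 1)]
print(popularity_recommend(train, "new_user", k=2))  # [('s1', 3.0, 1), ('s2', 2.0, 2)]
```

Because the ranking is global, two users with no listening history in common still receive nearly identical lists, which is the weakness the item-similarity model addresses.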

Personalized recommendation (item-similarity model)

In [12]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id = 'user_id',
                                                                item_id = 'song')
Recsys training: model = item_similarity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 1.074s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 1ms                            | 1.5        |
| 33ms                           | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 255ms                               | 0                | 0               |
| 1.51s                               | 100              | 9952            |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 2.586s
In [14]:
personalized_model.recommend(users = [users[0]])
Out[14]:
+----------------------------------------------+----------------------------------------------+-----------------+------+
| user_id                                      | song                                         | score           | rank |
+----------------------------------------------+----------------------------------------------+-----------------+------+
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Cuando Pase El Temblor - Soda Stereo ...     | 0.0194504536115 | 1    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Fireflies - Charttraxx Karaoke ...           | 0.0144737317012 | 2    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Love Is A Losing Game - Amy Winehouse ...    | 0.0142865960415 | 3    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Marry Me - Train                             | 0.014133471709  | 4    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Secrets - OneRepublic                        | 0.013591665488  | 5    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Sehr kosmisch - Harmonia                     | 0.0133987894425 | 6    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Te Hacen Falta Vitaminas - Soda Stereo ...   | 0.0129302831796 | 7    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | OMG - Usher featuring will.i.am ...          | 0.0127778282532 | 8    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Y solo se me ocurre amarte (Unplugged) - ... | 0.0123411279458 | 9    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | No Dejes Que... - Caifanes ...               | 0.0121042499175 | 10   |
+----------------------------------------------+----------------------------------------------+-----------------+------+
[10 rows x 4 columns]
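`item_similarity_recommender` scores a candidate song by how similar it is to the songs the user already played. To our understanding, its default similarity is Jaccard similarity between the sets of users who interacted with each item. The sketch below averages those similarities over the user's history; treat it as a simplified approximation of the model's scoring rule, not its exact formula. GraphLab also builds a candidate set for new users (see the training log), whereas this sketch simply returns an empty list for them. The data and the `item_similarity_recommend` helper are hypothetical:

```python
from collections import defaultdict

def jaccard(a, b):
    """|A & B| / |A | B| for two sets of users (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0

def item_similarity_recommend(train, user, k=10):
    """Score unseen items by their mean Jaccard similarity to the user's items."""
    listeners = defaultdict(set)  # item -> set of users who played it
    history = set()               # items this user played
    for u, item in train:
        listeners[item].add(u)
        if u == user:
            history.add(item)
    if not history:
        # Simplification: no fallback for users absent from `train`.
        return []
    scores = {}
    for cand, cand_users in listeners.items():
        if cand in history:
            continue
        total = sum(jaccard(cand_users, listeners[i]) for i in history)
        scores[cand] = total / len(history)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, score, rank) for rank, (item, score) in enumerate(ranked[:k], start=1)]

train = [("a", "s1"), ("a", "s2"), ("b", "s1"), ("b", "s2"),
         ("b", "s3"), ("c", "s3"), ("c", "s4")]
# s3 shares listener "b" with s1 and s2, so it outranks s4 for user "a".
print(item_similarity_recommend(train, "a"))
```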

Similarity between songs

In [21]:
personalized_model.get_similar_items([song_data['song'][0]])
Getting similar items completed in 0.001
Out[21]:
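+-------------------------+--------------------------------------------------+-----------------+------+
| song                    | similar                                          | score           | rank |
+-------------------------+--------------------------------------------------+-----------------+------+
| The Cove - Jack Johnson | Moonshine - Jack Johnson                         | 0.179487168789  | 1    |
| The Cove - Jack Johnson | Holes To Heaven - Jack Johnson ...               | 0.11196911335   | 2    |
| The Cove - Jack Johnson | Country Road - Jack Johnson / Paula Fuga ...     | 0.0839694738388 | 3    |
| The Cove - Jack Johnson | Supposed To Be - Jack Johnson ...                | 0.0740740895271 | 4    |
| The Cove - Jack Johnson | Let It Be Sung - Jack Johnson / Matt Costa / ... | 0.0736842155457 | 5    |
| The Cove - Jack Johnson | Wrong Turn - Jack Johnson                        | 0.07096773386   | 6    |
| The Cove - Jack Johnson | Questions - Jack Johnson                         | 0.0687022805214 | 7    |
| The Cove - Jack Johnson | Rainbow - Jack Johnson / G. Love ...             | 0.0649350881577 | 8    |
| The Cove - Jack Johnson | Posters - Jack Johnson                           | 0.059360742569  | 9    |
| The Cove - Jack Johnson | If I Could - Jack Johnson                        | 0.0588235259056 | 10   |
+-------------------------+--------------------------------------------------+-----------------+------+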
[10 rows x 4 columns]
In [23]:
personalized_model.get_similar_items([song_data['song'][-1]])
Getting similar items completed in 0.002
Out[23]:
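+----------------------+-------------------------------------------------+-----------------+------+
| song                 | similar                                         | score           | rank |
+----------------------+-------------------------------------------------+-----------------+------+
| Keep Away - Godsmack | Voodoo - Godsmack                               | 0.142156839371  | 1    |
| Keep Away - Godsmack | Moon Baby - Godsmack                            | 0.135135114193  | 2    |
| Keep Away - Godsmack | Straight Out Of Line - Godsmack ...             | 0.134453773499  | 3    |
| Keep Away - Godsmack | Serenity - Godsmack                             | 0.097744345665  | 4    |
| Keep Away - Godsmack | I Stand Alone - Godsmack                        | 0.0893854498863 | 5    |
| Keep Away - Godsmack | Shine Down - Godsmack                           | 0.058252453804  | 6    |
| Keep Away - Godsmack | Guarded (Album Version) - Disturbed ...         | 0.0522388219833 | 7    |
| Keep Away - Godsmack | Droppin' Plates (Album Version) - Disturbed ... | 0.0454545617104 | 8    |
| Keep Away - Godsmack | Awake - Godsmack                                | 0.0454545617104 | 9    |
| Keep Away - Godsmack | The Game (Amended Version) - Disturbed ...      | 0.04040402174   | 10   |
+----------------------+-------------------------------------------------+-----------------+------+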
[10 rows x 4 columns]
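`get_similar_items` returns a song's nearest neighbours under the model's item-item similarity. Both results stay mostly within one artist (Jack Johnson; Godsmack, then Disturbed), plausibly because people who play one song by an artist tend to play others, so those songs' listener sets overlap. A sketch of ranking neighbours by Jaccard similarity of listener sets, assuming Jaccard is the similarity in use; the data and the `similar_items` helper are hypothetical:

```python
from collections import defaultdict

def similar_items(train, item, k=10):
    """Return (item, similar, score, rank) rows ranked by Jaccard similarity."""
    listeners = defaultdict(set)  # item -> set of users who played it
    for user, it in train:
        listeners[it].add(user)
    target = listeners.get(item, set())
    scored = []
    for other, users in listeners.items():
        if other == item:
            continue
        union = target | users
        score = len(target & users) / float(len(union)) if union else 0.0
        scored.append((other, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(item, other, score, rank)
            for rank, (other, score) in enumerate(scored[:k], start=1)]

train = [("a", "The Cove"), ("b", "The Cove"),
         ("a", "Moonshine"), ("b", "Moonshine"), ("c", "Moonshine"),
         ("c", "Voodoo")]
for row in similar_items(train, "The Cove"):
    print(row)
```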

Evaluating the models

In [24]:
import matplotlib.pyplot as plt
%matplotlib inline
model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=.05)
compare_models: using 2931 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/2931 queries. users per second: 13513.5
recommendations finished on 2000/2931 queries. users per second: 12269.9
Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    | 0.0259297168202 | 0.00694071272372 |
|   2    | 0.0252473558512 | 0.0134330745641  |
|   3    | 0.0227453656318 |  0.018406740157  |
|   4    | 0.0213237802798 | 0.0228748506128  |
|   5    | 0.0204708290686 | 0.0274927137005  |
|   6    | 0.0196178778574 | 0.0319549508265  |
|   7    | 0.0183750060925 | 0.0349309420062  |
|   8    | 0.0173575571477 | 0.0375040484176  |
|   9    | 0.0164524811403 | 0.0405147631094  |
|   10   |  0.015933128625 | 0.0427994661618  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/2931 queries. users per second: 12500
recommendations finished on 2000/2931 queries. users per second: 12195.1
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.186966905493 | 0.0594559320604 |
|   2    |  0.155919481406 | 0.0923041238503 |
|   3    |  0.137268281588 |  0.118459454405 |
|   4    |  0.123080859775 |  0.139317867995 |
|   5    |  0.111907198908 |  0.156580338735 |
|   6    |  0.102297281929 |  0.170049700079 |
|   7    | 0.0948481746844 |  0.183654773967 |
|   8    | 0.0883230979188 |  0.195696294939 |
|   9    | 0.0831722203268 |  0.206249758112 |
|   10   | 0.0784373933811 |  0.214801976299 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
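`compare_models` evaluates the models in list order (M0 is `popularity_model`, M1 is `personalized_model`) by recommending songs for sampled test users and checking them against the songs those users have in `test_data`. Precision at cutoff k is the fraction of the top-k recommendations found in the user's held-out songs; recall at k is the fraction of held-out songs found in the top k; both are averaged over users. This is why precision falls and recall rises as the cutoff grows. At cutoff 1, M1's mean precision (0.187) is roughly seven times M0's (0.026). A minimal sketch of these metrics, with hypothetical recommendations and held-out sets; dividing by k even when fewer than k items are recommended is an assumption:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended: ranked list of items; relevant: set of held-out items.
    """
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

def mean_precision_recall(recs_by_user, test_by_user, k):
    """Average per-user precision@k and recall@k over users with held-out items."""
    users = [u for u, items in test_by_user.items() if items]
    if not users:
        return 0.0, 0.0
    pairs = [precision_recall_at_k(recs_by_user.get(u, []), test_by_user[u], k)
             for u in users]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r

recs = {"u1": ["s1", "s2", "s3"], "u2": ["s4", "s5", "s6"]}
held_out = {"u1": {"s2", "s9"}, "u2": {"s7"}}
for k in (1, 2, 3):
    print(k, mean_precision_recall(recs, held_out, k))
```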

In [26]:
model_performance1 = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
graphlab.show_comparison(model_performance1, [popularity_model, personalized_model])
compare_models: using 2931 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/2931 queries. users per second: 13333.3
recommendations finished on 2000/2931 queries. users per second: 12345.7
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    | 0.0337768679632 | 0.0108918054159 |
|   2    | 0.0300238826339 | 0.0188657952988 |
|   3    | 0.0272944387581 | 0.0249256666606 |
|   4    |  0.026526782668 | 0.0317976017311 |
|   5    | 0.0237461617195 | 0.0350121238555 |
|   6    |  0.022233594905 | 0.0388166630746 |
|   7    |  0.021250670176 | 0.0428438359093 |
|   8    | 0.0200017059024 | 0.0457587419972 |
|   9    | 0.0192956518443 |  0.049510791911 |
|   10   | 0.0181508017741 | 0.0515811409107 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/2931 queries. users per second: 11111.1
recommendations finished on 2000/2931 queries. users per second: 11299.4
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.190719890822 | 0.0597096660684 |
|   2    |  0.158819515524 |  0.094827621107 |
|   3    |  0.14204480837  |  0.123848294793 |
|   4    |  0.127004435346 |  0.147071126588 |
|   5    |  0.115591948141 |  0.166497009473 |
|   6    |  0.104799272148 |  0.180650110627 |
|   7    | 0.0971389579373 |  0.193925833729 |
|   8    | 0.0900716479017 |  0.204521470796 |
|   9    | 0.0841199438948 |  0.21263606508  |
|   10   | 0.0796315250768 |  0.22331463648  |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

Model compare metric: precision_recall
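Both evaluation runs use `user_sample=0.05` and report 2931 users, yet their numbers differ (M0's mean precision at cutoff 1 is 0.0259 in the first run and 0.0338 in the second). A likely explanation is that each call draws a different random 5% of test users. The sketch below assumes a fixed-size sample of `round(fraction * n)` users, which fits the identical user count but is not confirmed by the source; `sample_users` is a hypothetical helper:

```python
import random

def sample_users(users, fraction, seed=None):
    """Hypothetical helper: draw round(fraction * len(users)) distinct users."""
    rng = random.Random(seed)
    n = int(round(fraction * len(users)))
    return rng.sample(users, n)

test_users = ["user%d" % i for i in range(1000)]
first_run = sample_users(test_users, 0.05, seed=1)
second_run = sample_users(test_users, 0.05, seed=2)
print(len(first_run), len(second_run))  # 50 50
print(len(set(first_run) & set(second_run)), "users in common")
```

Same sample size, different members: averaged metrics computed on the two samples will generally differ, although M1 outperforms M0 in both runs.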
In [ ]: